AITopics | sparse gradient

Collaborating Authors

sparse gradient

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Differentially Private Optimization with Sparse Gradients

Neural Information Processing SystemsMay-27-2025, 05:24:22 GMT

Motivated by applications of large embedding models, we study differentially private (DP) optimization problems under sparsity of individual gradients. We start with new near-optimal bounds for the classic mean estimation problem but with sparse data, improving upon existing algorithms particularly for the high-dimensional regime. The corresponding lower bounds are based on a novel block-diagonal construction that is combined with existing DP mean estimation lower bounds.Next, we obtain pure- and approximate-DP algorithms with almost optimal rates for stochastic convex optimization with sparse gradients; the former represents the first nearly dimension-independent rates for this problem. Furthermore, by introducing novel analyses of bias reduction in mean estimation and randomly-stopped biased SGD we obtain nearly dimension-independent rates for near-stationary points for the empirical risk in nonconvex settings under approximate-DP.

differentially private optimization, dimension-independent rate, sparse gradient, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.46)

Add feedback

Grad Queue : A probabilistic framework to reinforce sparse gradients

Hasib, Irfan Mohammad Al

arXiv.org Artificial IntelligenceApr-25-2024

Informative gradients are often lost in large batch updates. We propose a robust mechanism to reinforce the sparse components within a random batch of data points. A finite queue of online gradients is used to determine their expected instantaneous statistics. We propose a function to measure the scarcity of incoming gradients using these statistics and establish the theoretical ground of this mechanism. To minimize conflicting components within large mini-batches, samples are grouped with aligned objectives by clustering based on inherent feature space. Sparsity is measured for each centroid and weighted accordingly. A strong intuitive criterion to squeeze out redundant information from each cluster is the backbone of the system. It makes rare information indifferent to aggressive momentum also exhibits superior performance with larger mini-batch horizon. The effective length of the queue kept variable to follow the local loss pattern. The contribution of our method is to restore intra-mini-batch diversity at the same time widening the optimal batch boundary. Both of these collectively drive it deeper towards the minima. Our method has shown superior performance for CIFAR10, MNIST, and Reuters News category dataset compared to mini-batch gradient descent.

arxiv preprint arxiv, batch size, gradient, (14 more...)

arXiv.org Artificial Intelligence

2404.16917

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Add feedback

Asymmetric Momentum: A Rethinking of Gradient Descent

Zhang, Gongyue, Zhang, Dinghuang, Zhao, Shuwen, Liu, Donghan, Toptan, Carrie M., Liu, Honghai

arXiv.org Artificial IntelligenceOct-3-2023

Through theoretical and experimental validation, unlike all existing adaptive methods like Adam which penalize frequently-changing parameters and are only applicable to sparse gradients, we propose the simplest SGD enhanced method, Loss-Controlled Asymmetric Momentum(LCAM). By averaging the loss, we divide training process into different loss phases and using different momentum. It not only can accelerates slow-changing parameters for sparse gradients, similar to adaptive optimizers, but also can choose to accelerates frequently-changing parameters for non-sparse gradients, thus being adaptable to all types of datasets. We reinterpret the machine learning training process through the concepts of weight coupling and weight traction, and experimentally validate that weights have directional specificity, which are correlated with the specificity of the dataset. Thus interestingly, we observe that in non-sparse gradients, frequently-changing parameters should actually be accelerated, which is completely opposite to traditional adaptive perspectives. Compared to traditional SGD with momentum, this algorithm separates the weights without additional computational costs. It is noteworthy that this method relies on the network's ability to extract complex features. We primarily use Wide Residual Networks for our research, employing the classic datasets Cifar10 and Cifar100 to test the ability for feature separation and conclude phenomena that are much more important than just accuracy rates. Finally, compared to classic SGD tuning methods, while using WRN on these two datasets and with nearly half the training epochs, we achieve equal or better test accuracy.

dataset, gradient, momentum, (16 more...)

arXiv.org Artificial Intelligence

2309.0213

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.65)

Add feedback

SparDL: Distributed Deep Learning Training with Efficient Sparse Communication

Zhao, Minjun, Yin, Yichen, Mao, Yuren, Chen, Lu, Gao, Yunjun

arXiv.org Artificial IntelligenceApr-3-2023

Top-$k$ sparsification has recently been widely used to reduce the communication volume in distributed deep learning; however, due to Gradient Accumulation (GA) dilemma, the performance of top-$k$ sparsification is still limited. Several methods have been proposed to handle the GA dilemma but have two drawbacks: (1) they are frustrated by the high communication complexity as they introduce a large amount of extra transmission; (2) they are not flexible for non-power-of-two numbers of workers. To solve these two problems, we propose a flexible and efficient sparse communication framework, dubbed SparDL. SparDL uses the Spar-Reduce-Scatter algorithm to solve the GA dilemma without additional communication operations and is flexible to any number of workers. Besides, to further reduce the communication complexity and adjust the proportion of latency and bandwidth cost in communication complexity, we propose the Spar-All-Gather algorithm as part of SparDL. Extensive experiments validate the superiority of SparDL.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2304.00737

Country: Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Adam

#artificialintelligenceNov-29-2022, 23:25:07 GMT

Training a neural network consist of optimizing stochastic functions in a high-dimensional space. In this context, finding a computationally and memory-efficient method with fast convergence properties is hard. On one hand, the objective functions encounter in Deep Learning are in most cases non-convex and non-stationary. On the other hand, gradients can be noisy and/or sparse. For efficient stochastic optimization, Adam, just like RMSProp and AdaGrad, rely only on first-order gradients.

gradient, sparse gradient, step size, (4 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.38)

Add feedback

S2 Reducer: High-Performance Sparse Communication to Accelerate Distributed Deep Learning

Ge, Keshi, Fu, Yongquan, Lai, Zhiquan, Deng, Xiaoge, Li, Dongsheng

arXiv.org Artificial IntelligenceOct-5-2021

Distributed stochastic gradient descent (SGD) approach has been widely used in large-scale deep learning, and the gradient collective method is vital to ensure the training scalability of the distributed deep learning system. Collective communication such as AllReduce has been widely adopted for the distributed SGD process to reduce the communication time. However, AllReduce incurs large bandwidth resources while most gradients are sparse in many cases since many gradient values are zeros and should be efficiently compressed for bandwidth saving. To reduce the sparse gradient communication overhead, we propose Sparse-Sketch Reducer (S2 Reducer), a novel sketch-based sparse gradient aggregation method with convergence guarantees. S2 Reducer reduces the communication cost by only compressing the non-zero gradients with count-sketch and bitmap, and enables the efficient AllReduce operators for parallel SGD training. We perform extensive evaluation against four state-of-the-art methods over five training models. Our results show that S2 Reducer converges to the same accuracy, reduces 81\% sparse communication overhead, and achieves 1.8$ \times $ speedup compared to state-of-the-art approaches.

allreduce, gradient, sparse gradient, (16 more...)

arXiv.org Artificial Intelligence

2110.0214

Genre:

Research Report > New Finding (0.86)
Research Report > Promising Solution (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Standardizing a Machine Learning Framework for Applied Research

#artificialintelligenceJun-29-2018, 02:57:48 GMT

Until now, the Machine Learning (ML) frameworks we've used at Borealis AI have varied according to individual preference. But as our applied team grows, we're finding that a preference-based system has certain shortcomings that have led to inefficiencies and delays in our research projects. As a result, we identified two main arguments in favour of standardizing a single framework for the lab. It has been our experience that independent frameworks do not often "play well" together. For example, a TensorFlow-based model applied to one research project would have to be rewritten in PyTorch for another project.

artificial intelligence, machine learning, pytorch, (18 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.58)

Add feedback

Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients

Schaul, Tom, LeCun, Yann

arXiv.org Artificial IntelligenceMar-27-2013

Recent work has established an empirically successful framework for adapting learning rates for stochastic gradient descent (SGD). This effectively removes all needs for tuning, while automatically reducing learning rates over time on stationary problems, and permitting learning rates to grow appropriately in non-stationary tasks. Here, we extend the idea in three directions, addressing proper minibatch parallelization, including reweighted updates for sparse or orthogonal gradients, improving robustness on non-smooth loss functions, in the process replacing the diagonal Hessian estimation procedure that may not always be available by a robust finite-difference approximation. The final algorithm integrates all these components, has linear complexity and is hyper-parameter free.

artificial intelligence, gradient, machine learning, (17 more...)

arXiv.org Artificial Intelligence

1301.3764

Country: Oceania > Tonga (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

Learning sparse gradients for variable selection and dimension reduction

Ye, Gui-Bo, Xie, Xiaohui

arXiv.org Machine LearningJul-1-2010

Variable selection and dimension reduction are two commonly adopted approaches for high-dimensional data analysis, but have traditionally been treated separately. Here we propose an integrated approach, called sparse gradient learning (SGL), for variable selection and dimension reduction via learning the gradients of the prediction function directly from samples. By imposing a sparsity constraint on the gradients, variable selection is achieved by selecting variables corresponding to non-zero partial derivatives, and effective dimensions are extracted based on the eigenvectors of the derived sparse empirical gradient covariance matrix. An error analysis is given for the convergence of the estimated gradients to the true ones in both the Euclidean and the manifold setting. We also develop an efficient forward-backward splitting algorithm to solve the SGL problem, making the framework practically scalable for medium or large datasets. The utility of SGL for variable selection and feature extraction is explicitly given and illustrated on artificial data as well as real-world examples. The main advantages of our method include variable selection for both linear and nonlinear predictions, effective dimension reduction with sparse loadings, and an efficient algorithm for large p, small n problems.

artificial intelligence, machine learning, optimization problem, (15 more...)

arXiv.org Machine Learning

1006.506

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Add feedback